1. INTRODUCTION


For the past year, we have performed a study of the computer analysis of
time-varying imagery. The original goal of this study was to develop techniques
which locate, track, identify and characterize a single rigid target
moving in an image sequence in which the camera platform (the observer) is
also moving. However, the work accomplished goes beyond what was
proposed. The requirement of a single target has been dropped. In addition, the
work on motion description has been completed.

The first progress report, describing a motion detection algorithm, was
submitted to ARO on June 30, 1982. This final report covers the
progress during the period between July 1, 1982 and January 31, 1983.

A paper [1] describing the motion detection algorithm was
published in the Proceedings of the 6th International Conference on Pattern
Recognition in October 1982 (see Appendix A).

The  participating  scientific  personnel  in  this  study  are  Wesley  E. 
Snyder,  Sarah  A.  Rajala  and  I-Sheng  Tang.  Mr.  Tang  completed  his 
doctoral  research  work  in  January  1983.  A  part  of  his  dissertation  [2], 
concerned  with  computer  analysis  of  motion  in  time-varying  imagery  containing 
multiple  rigid  moving  objects,  is  included  in  Appendix  C. 

2.  RESEARCH  PROBLEM 

The  original  goal  of  this  study  was  to  develop  techniques  which  locate, 
track,  identify,  and  characterize  a  single,  rigid  target  moving  in  an  image 
sequence  in  which  the  camera  platform  is  also  moving.  The  specific  research 
tasks  are  as  follows: 

1.  Obtain  a  realistic  data  base; 

2.  determine  the  optical  flow; 

3.  process  the  optical  flow,  and 

4.  develop  a  model  of  the  target. 
Items 1-3 have been completed successfully within the stated assumptions.
Item 4 has been addressed, and progress has been made toward a solution. In addition,
it has been possible to relax some of the assumptions stated in the proposal.
The requirement of dealing with a single moving target has been dropped;
multiple moving targets (up to 3 targets) are manageable by the developed
techniques. In addition, a near-natural language description of the motion of
each moving target has been developed.

3.  DATA  BASE 

Two classes of time-varying imagery were used primarily in this study.
The first class, imagery with high detail, is exemplified by a real-world image
sequence containing a street scene. The second class, rapid motion against a simple
background, is represented by several laboratory-generated sequences of
radio-controlled toy cars having different types of movement. These image
sequences were used to study the problems arising when
multiple targets move in an image sequence taken with a stationary camera. A
further set of time-varying imagery comprises two synthetic image sequences
and several real-world FLIR image sequences (from Martin Marietta
Aerospace, Orlando, FL). These image sequences were used to study background
motion under the condition that a single target is moving in an image
sequence taken with a moving camera (sensor).

These  test  image  sequences  are  shown  in  Appendix  B. 

4.  SUMMARY  OF  RESULTS 

The results of this study to date are summarized below:
1. Moving target detection algorithm.

A simple frame differencing algorithm is followed by region
growing on the thresholded difference picture. The motion
areas are then linked into a graph structure. Rectangular
windows are placed around the linked areas. Each window may
contain a single target, a part of a target (e.g. when occluded
by the foreground), or more than one target (e.g. when
one target is occluded by another).

2. Window tracking and predicting algorithm.

The moving target detection algorithm provides a set of windows
which are placed around the targets. In this algorithm,
a basic set of mapping rules is developed to handle
the tracking from frame to frame. In addition, a set of
rules is used to correct for imprecise windows due to jitter
and time-varying noise. Another set of rules is
developed to predict target locations when a target is occluded or when
a window containing two targets must be split. This algorithm
is capable of handling a variety of situations, particularly
those involving occlusion.

3. Corner matching algorithm.

A corner matching algorithm, using relaxation labeling to
derive the direction of motion in the 2-D sense, was
developed. This algorithm is capable of searching for corresponding
corners of a target in a pair of consecutive
frames even when partial occlusion occurs.
4. Motion description output.

A capability for near-natural language description of the motion
is implemented. A qualitative description of each target's
motion and of the relationships between targets in an image
sequence is provided through processing of the results of
the low-level image analysis.

5.  Extraction  of  moving  targets. 

A procedure was developed for extracting images of moving
targets. An approximation of a target's image can be acquired.
This result provides a basis for further study of target description.

6. Background motion estimation.

In the case of a moving camera (sensor) producing an image
sequence, the apparent background motion reflects the sensor
motion. If the apparent background motion can be analyzed, then
the analysis can be coupled with a camera model to provide
ground topology or sensor platform motion. In addition,
knowledge of the background motion can make frame-to-frame
registration possible. The algorithms developed for a stationary
camera are then applicable to the registered image sequence
generated by a moving camera.

A transform-based approach (see Appendix D) has been developed to estimate
background motion in a FLIR image sequence. The calculation of sensor motion
parameters is currently being investigated. The current algorithm is applicable
to translational sensor motion only.
5. CONCLUDING REMARKS

We have accomplished the proposed research on a number of aspects of the
motion analysis problem in time-varying imagery. The developed techniques are
capable of locating, tracking, identifying, and characterizing a single, rigid
target moving in an image sequence. With a moving camera, a transform-based
algorithm is capable of estimating the background motion. However, incorporating
the background motion information into the techniques developed
for a stationary camera remains for further investigation.

In addition to the work stated above, the developed techniques can
handle more than one target, and the developed system provides
near-natural language output.

The accomplished work has concentrated on motion analysis. A further extension
of this work might focus on target description, incorporating motion
information. Another possible extension would be the investigation of the
feasibility of hardware implementation, described in the next paragraph, and of the
appropriate system architecture.

The implemented moving target detection algorithm can be modified to perform
the differencing and region growing steps in parallel.
These two major bottlenecks might be removed by a hardware implementation of
the modified algorithm. Another candidate for hardware implementation is the
window tracking algorithm. Since a window is simply represented by a quadruple
(left, right, top, bottom), this algorithm has the potential to be implemented
in hardware.

We feel the accomplished work provides an adequate basis for the
aforementioned future research. A proposal detailing the future research has
been submitted to the ARO.



APPENDICES 


APPENDIX A

"Extraction of Moving Objects in Dynamic Scenes"


by 

I-Sheng  Tang,  Wesley  E.  Snyder  and  Sarah  A.  Rajala 




EXTRACTION OF MOVING OBJECTS IN DYNAMIC SCENES*


I-Sheng  Tang,  Wesley  E.  Snyder  and  Sarah  A.  Rajala 


Department  of  Electrical  Engineering 
North  Carolina  State  University 
Raleigh,  North  Carolina 


Abstract 


This paper presents a method of identifying
the images of moving objects in real world dynamic
scenes where both the moving objects and the background
are nonhomogeneous. We show that region
growing on a thresholded difference picture followed
by linking neighboring regions determines a window
around the image of each moving object. Once
the windows have been determined, we can separate
images of moving objects from stationary scene components
using a simple pixel-based process. Some
refining processes are discussed, and some experiments
are demonstrated.

1. Introduction

The existing schemes for extracting the images
of moving objects from dynamic scenes encounter
difficulties when the scenes contain nonhomogeneous
moving objects and a nonhomogeneous background
and the contrast between objects and their
surrounding background is low.

Earlier efforts [3,8] are primarily directed at analyzing
objects having translational movement and a
sufficiently contrasting background. Jain et al.
utilize the properties of first and second-order
difference pictures to extract the image of a moving
object which may have simultaneous translational
and rotating movements. However, the restriction
of a sufficient contrast against the stationary
scene component has not been alleviated.

If the contrast is well defined, the images of
moving objects can be extracted by classifying
regions in a difference picture and applying region
growing and region decaying processes [5].
However, a recent report shows that the classification
algorithm is not successful in handling an
image sequence with a low contrast, nonhomogeneous
background and nonhomogeneous objects.

In this paper, we describe an approach for
extracting the images of moving objects under the
conditions of nonhomogeneous moving objects, nonhomogeneous
background and poor contrast between
objects and background.


* This work was supported by the Army Research
Office under Grant DAAG-29-82-K-0070.


2. Assumptions About Dynamic Scenes

We first describe some assumptions about
the analyzed dynamic scenes. This will also indicate
some of the limitations of the proposed algorithm.

We first assume that the direction of illumination
is fixed so that variations of grayvalue
at a fixed pixel position are mostly due to motion.
We assume that the television camera is stationary
so that spatial coordinates of images are fairly
well aligned. No radical change in the shape and
position of the same moving object in any pair of
consecutive frames is allowed. We assume the images
of moving objects have motion over at least
several pixels per frame in at least one direction.
This restricts our attention to objects which
are not slowly moving. The images of moving
objects should also have a reasonably large size.
Although we assume the condition of poor contrast
between the images of moving objects and the background,
we should mention that "poor contrast"
means difficulty for automated edge detection.

3.  Description  of  the  Algorithm 

We approach this problem in two major steps.
First we try to place a window around an image of
the moving object. Then we extract the image of
the moving object within the window.

The algorithm for placing a window around an
image of a moving object is the following:

1. generate a difference picture (DP),

2. generate a thresholded difference picture
(TDP) with a threshold value (THD),

3. segment the TDP into regions (STDP), and

4. link neighboring regions, in which each
region contains pixels above a threshold,
then place a window around each
moving object and delete small windows.

The first step is simply the subtraction of
the gray level at each pixel position in one frame
from that of the corresponding pixel in the
other frame, followed by taking the absolute value
of the result.

Next,  the  DP  is  thresholded.  All  the  values 
less  than  THD  are  set  to  zero. 
The above two steps are essentially a high-pass
(temporal) filtering process. Because we suppress
low frequencies, represented by slow changes
in gray level at the same pixel position between
two consecutive frames, the analyzed dynamic image
sequence cannot contain images of very slowly moving
objects.
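
The two steps above (differencing, then thresholding) can be sketched in a few lines. This is only an illustration under assumptions: the function name `thresholded_difference`, the use of NumPy arrays for frames, and the sample pixel values are not from the paper.

```python
import numpy as np

def thresholded_difference(frame_a, frame_b, thd):
    """Return the thresholded difference picture (TDP) of two gray-level frames."""
    # Use a signed type so the subtraction of unsigned pixels cannot wrap around.
    dp = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    # Difference values less than THD are set to zero; the others are kept.
    return np.where(dp < thd, 0, dp)

# Hypothetical 3x3 frames: a bright pixel moves one position to the right,
# and the top-left pixel changes slightly (noise).
frame_1 = np.array([[10, 10, 10], [10, 200, 10], [10, 10, 10]], dtype=np.uint8)
frame_2 = np.array([[12, 10, 10], [10, 12, 200], [10, 10, 10]], dtype=np.uint8)
tdp = thresholded_difference(frame_1, frame_2, thd=20)
```

In this toy case the small change at the top-left pixel is suppressed, while the two positions vacated and newly occupied by the bright pixel survive in the TDP.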

We then apply a region growing technique [10]
to the TDP produced in step 2. Generally, the largest region
happens to be the stationary component plus homogeneous
parts of the images of the moving objects.
This region will be discarded in step 4. Because
of noise from image recording processes and insufficient
evidence of the existence of a moving object,
regions with a small number of pixels will
also be discarded.
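
A minimal sketch of this segmentation step follows. It is an assumption-laden stand-in, not the region growing technique of [10]: it simply labels 4-connected groups of nonzero TDP pixels by breadth-first search and drops groups with fewer than a hypothetical `min_pixels` members; the zero-valued (stationary) region is never collected, which corresponds to discarding it.

```python
from collections import deque
import numpy as np

def grow_regions(tdp, min_pixels=2):
    """Collect 4-connected regions of nonzero TDP pixels as lists of (row, col)."""
    rows, cols = tdp.shape
    visited = np.zeros((rows, cols), dtype=bool)
    regions = []
    for r in range(rows):
        for c in range(cols):
            if tdp[r, c] == 0 or visited[r, c]:
                continue
            # Grow a new region outward from this seed pixel.
            visited[r, c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and tdp[ny, nx] != 0 and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Regions with too few pixels are treated as noise and dropped.
            if len(pixels) >= min_pixels:
                regions.append(pixels)
    return regions

# Hypothetical TDP: one three-pixel region and one isolated noisy pixel.
sample_tdp = np.zeros((5, 5), dtype=np.int32)
sample_tdp[1, 1] = sample_tdp[1, 2] = sample_tdp[2, 1] = 50
sample_tdp[4, 4] = 30
regions = grow_regions(sample_tdp, min_pixels=2)
```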

In the last step, we use a single-linkage
algorithm [2] to link neighboring regions together.
We represent each region as a point in a two-dimensional
image plane at the center of gravity of
its extremes. The minimum distance between clusters
$K_1$ and $K_2$ is measured as

$D_{\min}(K_1, K_2) = \min \lVert X_1 - X_2 \rVert$, where $X_1 \in K_1$ and $X_2 \in K_2$.

The algorithm terminates when $D_{\min}$
between the nearest clusters exceeds a threshold. The
results can be thought of as partitioning a minimal
spanning tree by breaking any edge with length
over a threshold.

The segmented, thresholded difference pictures
(STDPs) have a large number of regions located
in the overlapped interior of the images of the
moving objects, while homogeneous images of moving
objects do not have this same property. Thus, a
single-linkage algorithm is sufficient. If the images
of moving objects are not close to each other,
then it is not likely that a given window will contain
two or more moving objects. Windows should be
conservative, so the extremes of the linked regions
in the x and y directions are used to form windows.
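
The linking and window-forming steps can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: regions are given as lists of (row, col) pixels, the names `region_center`, `link_regions`, `window` and `max_gap` are hypothetical, and a union-find structure is used because joining every pair of points closer than the threshold yields the same clusters as single-linkage merging stopped at that threshold.

```python
import math

def region_center(pixels):
    """Center of the extremes (bounding box) of a region of (row, col) pixels."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return ((min(ys) + max(ys)) / 2, (min(xs) + max(xs)) / 2)

def link_regions(regions, max_gap):
    """Single-linkage clustering: join regions whose centers lie within max_gap."""
    centers = [region_center(r) for r in regions]
    parent = list(range(len(regions)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if math.dist(centers[i], centers[j]) <= max_gap:
                parent[find(i)] = find(j)

    clusters = {}
    for i, region in enumerate(regions):
        clusters.setdefault(find(i), []).extend(region)
    return list(clusters.values())

def window(pixels):
    """Conservative window (left, right, top, bottom) from the cluster extremes."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(xs), max(xs), min(ys), max(ys))

# Hypothetical regions: two close together, one far away.
sample_regions = [[(1, 1), (1, 2)], [(2, 4), (3, 4)], [(20, 20), (20, 21)]]
windows = sorted(window(c) for c in link_regions(sample_regions, max_gap=5))
```

Here the first two regions are linked into one window and the distant region keeps its own window.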

The algorithm described above can be understood
more easily through some simple
examples. For simplicity, assume the background
environments are the same for the images of a moving
object in frames i and i+1. Then a homogeneous
region in each frame, representing the image of a
moving object having a translational movement parallel
to its edges, generates two regions in the
difference picture (refer to figure 1). However,
more regions can be generated in the difference
picture of nonhomogeneous images of a moving object
(e.g. five regions constitute the image of a moving
object in figure 2).

Figure 3 shows that the proposed algorithm
works well even if some of the regions constituting the image of a
moving object have poor contrast against the
background. In figure 3, two regions, one in the
upper right and the other in the lower left, do not
have sufficient contrast against their backgrounds.
Using the algorithm described above, the determined
window still contains the image of the moving object.
The above algorithm can be applied for each
pair of consecutive frames through the whole image
sequence, resulting in the detection of motion.
The next step is the extraction of the image of the
moving object within each window.

In the poor contrast situation, the extraction
is not reliable using only a single frame difference
pair. Information must be accumulated over
time. Knowing that the image of a moving object
only temporarily occupies a specific region in the
picture, we can extract the moving object utilizing
information derived from a partial image sequence
in which the moving object is not located within
the corresponding windows. That is, having detected
motion within a window, we search for a set of
frames in which the moving object has left that
window. From those frames, we can accurately model
the background and thus extract the moving object
precisely.

Two pixel-based processes for extracting
moving objects are used. Suppose we have n frames,
say from j to j+n-1. Further, suppose that in these frames the moving
object is not located within the window corresponding
to frame i, the desired frame for extracting
the image of the moving object. The n absolute
difference pictures between frame i and frames j
through j+n-1 are formed. At each pixel position,
the number of these n difference pictures whose
value exceeds a threshold is counted. If the
count exceeds a threshold p, where 0 ≤ p ≤ n, then a "1"
is marked to denote a moving object. Otherwise, a
"0" is marked to indicate background. Alternatively,
we can compute the mean and standard deviation of the gray
level at each pixel position for the n frames.
Then we accept a pixel as belonging to the image of
a moving object if its gray level in frame i is
outside some multiple of the standard deviation around
the mean.
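
Both pixel-based processes can be sketched with array operations. The code below is illustrative only: the function names, the parameter `k` (the multiple of the standard deviation), and the sample data are assumptions, and the n object-free frames are assumed to be stacked into one array of shape (n, rows, cols).

```python
import numpy as np

def extract_by_voting(target, background_frames, thd, p):
    """Mark a pixel 1 if more than p of the n difference pictures exceed thd."""
    diffs = np.abs(background_frames.astype(np.int32) - target.astype(np.int32))
    votes = (diffs > thd).sum(axis=0)  # per-pixel count over the n frames
    return (votes > p).astype(np.uint8)

def extract_by_statistics(target, background_frames, k):
    """Mark a pixel 1 if it lies more than k standard deviations from the mean."""
    mean = background_frames.mean(axis=0)
    std = background_frames.std(axis=0)
    return (np.abs(target - mean) > k * std).astype(np.uint8)

# Hypothetical data: four object-free frames near gray level 100, and a
# target frame whose center pixel is covered by a dark moving object.
background = np.stack([np.full((3, 3), v) for v in (98, 100, 102, 100)])
target_frame = np.full((3, 3), 100)
target_frame[1, 1] = 30

mask_vote = extract_by_voting(target_frame, background, thd=20, p=2)
mask_stat = extract_by_statistics(target_frame, background, k=3)
```

In this toy case both masks mark only the center pixel. The voting variant needs two thresholds, while the statistical variant adapts to the per-pixel noise level through the standard deviation.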


4.  Results 




A real world traffic scene is used to test
the proposed algorithm. There are four moving objects
in the scene: a black car (CAR) going east,
a taxicab (CAB) turning right and away, a black
wagon (WAGON) going west, and a person in the left
upper corner walking down the street. The black
cars, CAR and WAGON, have poor contrast against the
background. The images of the moving objects are
nonhomogeneous, especially the CAR and WAGON. The
image sequence also has a foreground object, a tree, in the
right lower corner of the image.

All the images of the PERSON are partially
detected, except in frame 3. The detection failure
is due to the size of the image of the PERSON and
the fact that the bottom part of the PERSON has a
very low contrast against its background. The CAR
in each frame is detected; likewise the CAB is detected
in each frame. However, a larger segmented
region close to those of the CAB is mislinked in frame
2 and frame 4 due to noise. In detecting the WAGON,
due to the interference of the foreground and
background, the results in frame 7 and frame 8 are
not satisfactory. Due to limited space, we only
show the results of the window placing algorithm
applied to frame 1 and frame 8 in figure 4.
Table 1 shows the coordinates of the windows
containing the images of the moving objects. These
data indicate that the CAR and WAGON are proceeding
in approximately translational movement and moving
into the visual field. The CAB stays in about the
same area; the radical change in the size of
its window in frame 2 and frame 4 is due to noise.
The PERSON is walking down in a southwest direction.

Figure 5(a) and (b) are the segmented images
of CAR.

5.  Discussion 

Our experiments show that the algorithm is
capable of extracting multiple images of moving objects
in a nonhomogeneous dynamic image sequence.
However, the algorithm is dedicated to handling the
situation of insufficient edge information. Otherwise,
the approach in [5] is preferred. This peripheral
process [7] in motion analysis will direct
the attention of higher level processes and reduce
the computational burden.

We can refine our results (e.g. the windows
containing the CAB in frame 2 and frame 4) using
the assumption that no radical change in the shapes
and positions of the images of moving objects can
occur. A more elaborate method such as the matching
of regions [1,9] in the corresponding windows may
improve the result.

In the process of moving object extraction,
isolated pixels may be deleted or added as members
of an image of a moving object depending on their
connectedness with respect to their neighboring
pixels. However, further study will be required to
implement such a technique.
Fig. 1 (a) Frame i, (b) Frame i+1, (c) DP generated from frames i and i+1.

Fig. 2 (a) Frame i, (b) Frame i+1, (c) DP generated from frames i and i+1.

Fig. 3 (a) Frame i, (b) Frame i+1, (c) DP generated from frames i and i+1.

Fig. 4 (a) Frame 1, (b) Frame 8.


Table  1 
* (A,B,C,D) denotes the four coordinates of the
extremes, (B,A), (B,C), (D,A) and (D,C),
of a window. The first element, e.g.
B of (B,A), represents the y coordinate. The
second element, e.g. A of (B,A), represents
the x coordinate.


Fig. 5 Extracted image of CAR in frame 8 using (a) voting, (b) mean and standard
deviation.
APPENDIX B


Image  Data  Bases 
APPENDIX C

Excerpts  from  Tang's  Thesis 
represented by the extremes. Furthermore, the center is

It is worth noting that the symbolic encoding of numeric
values for moving directions provides not only for an
abstract level manipulation in future processing, but also

Figure 3.2 Connections between object OBJ1(5) and
its neighbors in the semantic network

Figure 3.3 A direction quantization
position in the other frame, followed


Figure  4.3  A  binary  difference  picture 


clusters  exceeds  a  threshold 


Figure  4.8  Detected  moving  objects  in  frame 
of  image  sequence  one 


Figures 4.15 to 4.18 are the results of locating the


moving  objects  in  frame  32 
sequence  one 


Tracking Moving Objects


Figure 4.24 Detected moving object in frame 13
of image sequence three


step described in Section 4.2 and to track moving
objects, thus providing motion information for further


{B(i,1), B(i,2), ..., B(i,l)} be the set of all matched windows

B(i,j1), B(i,j2), B(i,j3), and B(i,j4) ∈ B(i), and the symbol
+ denotes an 'associated' operation of windows. The
mapping of A(i,i1) + A(i,i2) -> B(i,j1) means that windows
A(i,i1) and A(i,i2) merge to one window, B(i,j1), in the
next frame. On the other hand, A(i,i1) -> B(i,j1)

window in a pair of consecutive frames at any direction.

The following rules, based on the property of consistency,
for guiding the tracking are listed below:

(9) If ... -> B(i,j1) + B(i,j2) and
... C(i+1,k1)), then check the area of


Flowchart of the tracking algorithm


detecting moving

Figure 4.36 Improved result of detecting moving


Figure 4.34 Improved result of detecting moving
objects in frame 20
objects  in  frame  20 


Predicting  Locations  of  Moving  Objects 


The Predicting Algorithm

Figure 4.43 Flowchart of predicting locations
of moving objects


Matching Descriptors


Figure 4.45 Predicted locations of moving objects
in frame of image sequence one

and computational complexities?

Research in physiology and psychology relating to this
The updating equations


Averaging directions over the members in the cluster


Matched corners on the CAB in frame 1 (in
matching corners between frames 1 and 2)

Figure 4.56 Matched corners on the
matching corners between


Matched corners on the car in frame
(in matching corners between frames
and 13 in image sequence three)
An attempt is made to classify windows based on the


information. The term 'window type classification

Figure 4.69 Flowchart of window classification


The label(s) of the object(s) are stored in the variable
events P1 and P2 trigger the verification of the first-


event F1. Similarly, F1 can trigger the verification of S1


other set of verifications.
events is described

ance" is in the sense of an object which can
not be tracked. This may indicate an object
which stops its movement.

The information about the disappearance of an
object(s) is represented by: the number of

This information is derived from the variable

objects, the objects' labels, and the frame

ject in the image boundary(ies) is represented
by: the number of appearances, the object's
label, and the frame times.

P4. The appearance of "no confidence" in deriving
information of an object's moving direction:

bel, the frame times, and the current and next




This is accomplished by computing the differences
of the extremes for a window containing
an object (which stopped its movement) and a

the two objects are first enclosed in the same
window is defined as the time of encountering.
The event defined to be 'leaving' occurs when

objects are moving in opposite directions


Figure 4.72 Illustration of the events of (a) encountering,
(b) leaving, (c) encountering-and-leaving,
(d) passing. ST, MT, and ET denote the start
time, middle time, and end time, respectively,
in the period of observation.

analyzed object during the period.

4. Delete each object, listed in Step

which does not move in a direction

posite to the analyzed object's moving


obj2 is moving across the visual field
substitute a grayvalue for each pixel within


Step 1 is accomplished by subtracting the grayvalue at


are black points. Figure 5.5 illustrates the result after
performing this step in figure 5.3.

Figure 5.2 Illustration of a BTDP (without
well-connected boundaries)

Generally, since the black points resulting from per-


Figure 5.6 Identified convex hull points in figure


Illustration of thresholding the image
Displacement Field Calculation by the Motion Detection
Transform with Applications in FLIR Imagery

by 


Margie  Groves 
ABSTRACT

Our  research  examines  a  new  transform-based  technique 
for  calculating  the  displacement  field  of  a  dynamic  sequence 
generated  by  a  single  moving  optical  sensor.  The  transform, 
based  on  the  one  dimensional  Fourier  transform,  computes 
displacement  vectors  from  raw  positional  intensity  data. 
Variations on two existing methods of analysis of displacement
fields are employed to extract additional information.
We use a variation on Holben's MTI (Moving Target Identification)
algorithm to effect frame-to-frame registration,
and we adapt a technique due to Nagel for calculation of
the parameters of sensor motion.
1.0 Introduction.

This paper first supplies background on terminology
and on notation. Following these preliminaries, the
transform employed in our research is described, and
results with synthetic data are cited. Next, in section 4,
the paper exhibits the problems of applying the transform in
real life contexts and gives our solutions to these problems.
Section 4 also develops the use of the transform in the
formation of displacement fields. Finally, the last two
sections incorporate the displacement field computations
into our chosen applications: spatial registration and computation
of sensor motion parameters. The first of these
two sections provides results on FLIR data.

2.0  Background. 

This  section  clarifies  our  terminology  as  it  applies  to 
the  current  topic. 

'Dynamic imagery' usually refers to a time sequence of
photographs. However, we limit our study to dynamic imagery
generated by a single moving optical sensor (such as a camera
mounted on an RPV). All subsequent discussion references
imagery gathered by a monocular optical system in motion
unless otherwise specified.


When a camera takes photographs as it moves past a nonhomogeneous
terrain, the picture plane (2D) projection of a
three-dimensional texture point changes picture plane position
between frames. If we draw a vector from the projection
of every texture point in frame i to its new position
in frame (i+1) we generate a displacement field. Thus the
displacement field shows how the image of each texture point
moves in the picture plane between frames.

Registration of two temporally adjacent frames of a
dynamic sequence aligns new and old positions for all
(motionless) texture points. The absolute frame difference
(|f2[i,j] - f1[i,j]| for all i,j) of two perfectly registered
frames of a dynamic sequence contains zeros everywhere
except in the vicinity of contrasting objects moving
relative to the background. We refer to the pixels in
(relative) motion as target pixels. In actuality, the system
encounters noise and changes in the intensity-to-grey scale
mapping. Additionally, perfect registration of two frames
of a dynamic sequence is an as yet unsolved problem, especially
in the absence of range data. We adopt conventional
terminology and term high intensity pixels not due to moving
objects false targets. (Any dc (constant) pixel intensity
can be discounted as due to change in grey scale mapping.)
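The absolute frame difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frames are taken to be already registered, and the function names, the tiny 4x4 frames and the threshold value are hypothetical, not part of the original system.

```python
def abs_frame_difference(f1, f2):
    """Return |f2[i][j] - f1[i][j]| for two equally sized frames (lists of rows)."""
    return [[abs(b - a) for a, b in zip(row1, row2)]
            for row1, row2 in zip(f1, f2)]


def candidate_target_pixels(diff, threshold):
    """List (i, j) addresses whose difference exceeds a hypothetical threshold."""
    return [(i, j)
            for i, row in enumerate(diff)
            for j, value in enumerate(row)
            if value > threshold]


# Two perfectly registered 4x4 frames: uniform background of 10 and one
# bright pixel (90) that moves one column to the right between frames.
frame1 = [[10] * 4 for _ in range(4)]
frame2 = [[10] * 4 for _ in range(4)]
frame1[1][1] = 90
frame2[1][2] = 90

difference = abs_frame_difference(frame1, frame2)
targets = candidate_target_pixels(difference, threshold=20)
```

With imperfect registration or noise, additional addresses would exceed the threshold; those are the false targets referred to in the text.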




2.1 Assumptions.

This section defines our assumptions as to data content
and character.


We assume our data has the following characteristics:


It is passively sensed data, either optical or thermal.

It represents a time sequence produced by a moving sensor
whose motion can be characterized, at least for
short periods of time, as having a constant velocity.

A high enough frame rate is used, ie, sufficient
"inter-frame overlap" exists so that registration is a
meaningful process (with "sufficient" as yet undefined).

(Projections of) objects possessing velocities (with
respect to the background) occupy a small portion of
the image plane.


We assume no foreknowledge of specific image contents.
Additionally, we assume that the sensor viewing angle is low
oblique (ie, the horizon does not appear in the images); thus
we have no cues from which to calculate viewing angle.


In addition, we wish to be able to tolerate poor image
quality. For example, the real-world data we employ is
notably undersampled FLIR with a low SNR.

Given  these  conditions,  feature  matching  approaches  to 
registration  are  impractical,  as  are  any  other  techniques 
dependent  on  local  image  characteristics. 


2.2  Notation 


Our notation is similar to that in Nagel (N1, N2, N3).
The following text summarizes this notation.

World coordinates appear, herein, as (X,Y,Z). Lower
case qualifiers signify a particular texture point. For
example, (Xm,Ym,Zm) designates the mth point in world
coordinates. The camera system coordinates appear as (XC,YC,ZC)
(C -camera- modifier). The term "focal length" refers to the
so-called effective focal length of the imaging system, and f
is used to represent this value. The camera coordinate system
has its origin located at the lens center and the point
(0,f,0) at the base of the focal axis. We use a camera
coordinate system with YC directed along the focal axis, ZC
pointing "up" and XC pointing to the "right", defining a
plane parallel to the image plane. The letter 'P' indicates
picture plane coordinates, such as (XP,f,ZP). Image plane
coordinates are closely related to camera coordinates. The
relationship between camera coordinates and picture plane
coordinates for the ith point is

(XCi, YCi, ZCi) = si (XPi, f, ZPi),

where si is a scale factor such that f*si = YCi. Note that
camera coordinates retain relative depth information for
each texture point, that is, they yield 3D structure. Picture
plane coordinates, however, lose relative depth information,
retaining only the visual direction to each texture
point. The world coordinate system is static throughout the
image sequence while the camera coordinate system changes,
with respect to the world coordinate system, from frame to
frame. Device coordinates, or pixel addresses, appear as
(XD,ZD) (D -device- modifier).
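The relationship between camera and picture plane coordinates can be illustrated with a short Python sketch. The function names and the numeric values are hypothetical; only the relation (XC, YC, ZC) = s (XP, f, ZP) with f*s = YC comes from the text.

```python
def to_picture_plane(camera_point, f):
    """Project camera coordinates (XC, YC, ZC) onto the picture plane.

    Returns the picture plane point (XP, f, ZP) and the scale factor s,
    where f * s = YC.
    """
    xc, yc, zc = camera_point
    if yc <= 0:
        raise ValueError("point must lie in front of the lens (YC > 0)")
    s = yc / f
    return (xc / s, f, zc / s), s


def to_camera(picture_point, s):
    """Recover camera coordinates from a picture plane point and its scale."""
    xp, f, zp = picture_point
    return (s * xp, s * f, s * zp)


picture_point, scale = to_picture_plane((2.0, 10.0, 4.0), f=5.0)
restored = to_camera(picture_point, scale)
```

The scale factor is exactly the depth information that the picture plane coordinates lose: without s, only the direction (XP, f, ZP) to the texture point is known.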

Lower  case  coordinates  (x,y)  are  employed  as  a  generic 
2D  system. 

We choose central projection as our model of the image
formation process.

3.0 The Motion Detection Transform.

The motion detection transform (MDT) combines the
Exponential Area Transform (ExpAT - defined below) with an
FFT (fast Fourier transform) to extract, from a dynamic
sequence, the component of image plane velocity (in pixels
per frame) in either spatial dimension. See Figure 1 for
the mathematical formulation of the MDT. Two clarifications
of the MDT are in order. First, the transform computes
movement parallel to some axis. Secondly, two applications
of the MDT supply the two components of the 2D velocity, one
application for each of the coordinate axes. References (R1
and R2) provide the complete theoretical basis for the
transform. The following text presents a somewhat intuitive
explanation.

Let t be the number of frames in a time sequence generated
by a stationary camera. For frame one, imagine a
homogeneous mxn frame containing a single contrasting
two-dimensional object. For simplicity, assume the background
intensity is zero, and that the object occupies a single
pixel and has unit intensity. The steps of the MDT for the
sequence follow. Project the plane onto (say) the x axis
(ie, sum across the columns). This yields a vector whose
entries are identically zero except at the projection
of the object. Multiply the ith component of this vector by
an exponential, exp(-2*j*k*i/t), and sum the m products.
The resulting sum equals exp(-2*j*k*i/t), where i is now the
position of the object's projection. Let the (1D)
object move unit distance with respect to the y axis between
frames one and two, and repeat the three steps (above: project,
multiply by the appropriate exponential, and sum) for
frame two. The result is exp(-2*j*k*(i+1)/t). If the object
continues to move one pixel per frame, and if the three
steps are repeated for each frame of the sequence, the
resulting vector of sums traces out a complex sinusoid with
frequency k. If, however, the object moves p pixels (where
the sign of p implies direction) between frames, the
sinusoid has frequency p*k. The Fourier transform of the
result vector consists of a single peak at frequency k*p.
The result vector is the ExpAT of the dynamic sequence for
the x dimension, and its Fourier transform is the MDT. A
peak search in the MDT domain, followed by division by k,
yields the component of object velocity in the x direction.
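The three steps above (project, weight by the exponential, sum), followed by a Fourier transform over the frame index, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the exponent carries the usual 2*pi factor of a discrete Fourier kernel (the formulas as printed here do not show it), and it uses a plain DFT rather than an FFT for brevity. With both kernels written with a negative exponent, an object moving +p pixels per frame peaks at bin -k*p (mod t), so the sketch undoes that sign. All function names are hypothetical.

```python
import cmath


def expat_x(frames, k):
    """ExpAT along x: one complex number per frame (frames[n][y][x])."""
    t = len(frames)
    result = []
    for frame in frames:
        width = len(frame[0])
        # Project onto the x axis: sum each column over all rows.
        projection = [sum(row[x] for row in frame) for x in range(width)]
        # Weight the ith projection entry by the exponential and sum.
        result.append(sum(value * cmath.exp(-2j * cmath.pi * k * i / t)
                          for i, value in enumerate(projection)))
    return result


def dft(values):
    """Plain discrete Fourier transform (an FFT would give the same bins)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * u * m / n)
                for m, v in enumerate(values))
            for u in range(n)]


def mdt_velocity_x(frames, k=1):
    """Peak search in the MDT domain followed by division by k."""
    t = len(frames)
    magnitudes = [abs(c) for c in dft(expat_x(frames, k))]
    peak = max(range(t), key=lambda u: magnitudes[u])
    signed_bin = peak if peak <= t // 2 else peak - t
    # Both kernels use a negative exponent, so motion of +p pixels per
    # frame peaks at bin -k*p (mod t); undo that sign here.
    return -signed_bin / k


def moving_point_sequence(t, width, height, x0, y0, p):
    """Zero background with one unit-intensity pixel moving p pixels/frame in x."""
    frames = []
    for n in range(t):
        frame = [[0] * width for _ in range(height)]
        frame[y0][x0 + p * n] = 1
        frames.append(frame)
    return frames


rightward = moving_point_sequence(t=16, width=64, height=8, x0=1, y0=3, p=2)
leftward = moving_point_sequence(t=16, width=64, height=8, x0=50, y0=3, p=-3)
```

In this sketch the velocity is unambiguous only while |k*p| stays below half the number of frames, which is one way to read the remark that the weighting factors set the maximum velocity detectable without aliasing.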


Alternatively, if each mxn frame contains some arbitrary
pattern but no motion occurs during the sequence, all
frames of the sequence are the same. Consequently, all
entries of the result vector (ExpAT) are identical, and the
MDT (Fourier transform of the result vector) consists of a
single dc (zero frequency) peak.

The explanation for the more general case of an object
moving across an arbitrary background combines the two cases
given above. The sequence generated by pointwise summation
of the two sequences described above closely models an
object in motion across an arbitrary background. Since the
steps of the MDT constitute a linear operator, their
application to the sequence representing the object moving across
an arbitrary terrain is equivalent to summing the two
previously derived MDTs. Thus the MDT of the sequence
representing the general case consists of a peak at frequency k*p and
a dc peak.
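The linearity argument can be checked numerically. The sketch below is illustrative only: it again assumes a 2*pi factor in the exponent, models the scene as the pointwise sum of a static background and a moving bright pixel (as the text does), and uses a hypothetical deterministic pattern as the "arbitrary" background.

```python
import cmath


def mdt_x(frames, k=1):
    """ExpAT along x followed by a DFT over the frame index."""
    t = len(frames)
    expat = [sum(value * cmath.exp(-2j * cmath.pi * k * x / t)
                 for row in frame for x, value in enumerate(row))
             for frame in frames]
    return [sum(g * cmath.exp(-2j * cmath.pi * u * n / t)
                for n, g in enumerate(expat))
            for u in range(t)]


T, WIDTH, HEIGHT, P = 16, 32, 8, 2

# Static background, identical in every frame (hypothetical pattern).
background = [[(3 * x + 5 * y) % 7 for x in range(WIDTH)]
              for y in range(HEIGHT)]
background_seq = [background for _ in range(T)]

# Single bright pixel moving P pixels per frame over a zero background.
object_seq = []
for n in range(T):
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    frame[3][1 + P * n] = 100
    object_seq.append(frame)

# Pointwise sum of the two sequences.
combined_seq = [[[b + o for b, o in zip(brow, orow)]
                 for brow, orow in zip(bframe, oframe)]
                for bframe, oframe in zip(background_seq, object_seq)]

m_background = mdt_x(background_seq)
m_object = mdt_x(object_seq)
m_combined = mdt_x(combined_seq)

# Strongest bin away from dc in the combined transform.
strongest_moving_bin = max(range(1, T), key=lambda u: abs(m_combined[u]))
```

The static background contributes only to bin 0, so every other bin of the combined transform equals the corresponding bin of the object-only transform. With the sign convention used here the moving pixel appears at bin (-k*P) mod T.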

Next consider the case of moving sensor, immobile terrain.
When the sensor moves across a textured terrain with
velocity p pixels per frame, the images of (stationary)
texture points appear to move at -p pixels per frame.
Reference (R2) shows that the MDT "maps pixels moving at the same
velocity into the same location in transform space." So,
while performance is demonstrably better in the stationary
camera case, extrapolation to the translating sensor model
is straightforward. The degradation in performance in the
moving camera case stems from the difference between new
pixels entering the transformation window and the old pixels
leaving the window.

Figure 1: MOTION DETECTION TRANSFORM EQUATIONS
Exponential Area Transform

\[
F_x(k_x, t) = \sum_{x=0}^{a-1} \sum_{y=0}^{b-1} f(x,y,t)\, \exp(-j\,2\,k_x\,x/c),
\qquad k_x = 0, 1, 2, \ldots \tag{1}
\]

\[
F_y(k_y, t) = \sum_{y=0}^{b-1} \sum_{x=0}^{a-1} f(x,y,t)\, \exp(-j\,2\,k_y\,y/c),
\qquad k_y = 0, 1, 2, \ldots \tag{2}
\]

with 0 <= x < a, 0 <= y < b, 0 <= t < c,

where kx and ky are weighting factors which determine both
the maximum velocity detectable (without aliasing) and the
resolution with which velocity may be detected.


from (R1) and (R2).


3.1  Preliminary  Results 


This  section  exhibits  the  highlights  of  our  research  to 
date  on  synthetic  imagery  involving  translating  sensors  and 
translating  objects. 

Results on synthetic images demonstrate that the technique
is effective both in pinpointing sensor velocity and
in separation of relative sensor-object velocity in cases
where moving targets occurred in the images (see Figures 2, 3).

Synthetic images were produced with intensity ranges of
[0-50] and [0-255] (Figures 2,4). The procedure was applied
to several sets of images of both categories. These simulations
indicate that results depend on the pixel intensity
range, as well as on such parameters as SNR, image resolution,
and frame-to-frame grey scale changes. There is a
simple reason for this dependency. The smaller intensity
differences between pixels entering and leaving the scene
create smaller disturbances in the sinusoid generated in the
first stage of the procedure. Evidently, the procedure may
benefit from image preprocessing by some function, such as
log, which reduces the intensity range (and variance).
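As a small illustration of the suggested preprocessing, the sketch below compresses a frame with a logarithm. The use of log(1 + v), chosen so that zero intensities remain defined, and the function names are assumptions made for this example only.

```python
import math


def log_compress(frame):
    """Map each intensity v to log(1 + v); the +1 keeps v = 0 defined."""
    return [[math.log(1 + value) for value in row] for row in frame]


def intensity_range(frame):
    """Difference between the largest and smallest intensity in a frame."""
    values = [v for row in frame for v in row]
    return max(values) - min(values)


frame = [[0, 50, 255], [10, 200, 120]]
compressed = log_compress(frame)
```

A frame spanning [0-255] is squeezed into roughly [0, 5.5], so the intensity differences between pixels entering and leaving the window shrink accordingly.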

The simulations were run with various numbers of frames
of several sizes (16x16, 32x32, 64x64 with 8, 16, 32
frames). As might be expected, among the tested variations,
larger frame sizes and fewer frames produced the best
results. Again, less disturbance was introduced into the
sinusoid (generated in step one of the motion detection
transform) by the smaller ratio of new pixels to old pixels
within the transform window. (We note that perspective
distortion in real images complicates this issue.)


Figure 2. Plots: MDT results for synthetic image sequence.
Random background intensities [0,50]. Block size
(64x64) in pixels. Sensor velocity (in pixels
per frame) (Vx,Vy) = (-2,3). No object motion
occurred in the sequence. Upper plot for Vy,
lower plot for Vx. Note: since kx=ky=1, the
frequency showing the peak directly gives the
velocity.

[Plots not reproduced. Axes: amplitude(Vx) and amplitude(Vy) versus frequency for k = 1.00.]


Figure 3. Plots: MDT results for synthetic image sequence.
Random background intensities [0,50]. Block size
(64x64) in pixels. Sensor velocity (in pixels
per frame) (Vx,Vy) = (-2,3). Object intensity 100.
Object velocity (7,8). Object size (6x6).
Upper plot for Vy, lower plot for Vx.

[Plots not reproduced. Axes: amplitude(Vx) and amplitude(Vy) versus frequency for k = 1.00.]


Figure 4. Plots: MDT results for synthetic image sequence.
Random background intensities [0,50]. Block
size (64x64). Sensor velocity (in pixels per
frame) (Vx,Vy) = (-2,3). Object intensity 50.
Object velocity (7,8). Object size 6x6. Note
that the peak due to the object is not much above the
extraneous peaks. Upper plot for Vy, lower plot for
Vx.

[Plots not reproduced. Axes: amplitude(Vx) and amplitude(Vy) versus frequency for k.]

4.0 Use of the Motion Transform in Cases of General Motion.

This section enumerates some of the inherent difficulties
in applying the motion detection transform to analysis
of general motion and presents our solution to these problems.
Our solution culminates in computation of a sort of
displacement field for the image sequence.

Mathematically speaking, the direct application of the
motion detection transform is valid only when all motion is
restricted to translation in planes parallel to the image
plane. With motion unrestricted, the sensor can rotate as
well as translate in depth. Furthermore, the transform will
detect different apparent velocities for texture points at
different depths (perspective or motion parallax effects).
However, we can, over small regions, approximate effects of
general sensor motion by the effects due to a translating
sensor. In addition, we choose to ignore the perspective
effects. We partition frames into small blocks. We assume
that perspective effects are minimal over a single block,
and can be dealt with more easily at the next (higher) level
of processing. Thus spatially dividing the images into
blocks simultaneously defers the perspective distortion
problems and permits characterization of a wide variety of
sensor motions.
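The partitioning step can be sketched as below. The block size, the dictionary layout and the choice to drop incomplete edge blocks are assumptions for illustration; the text does not specify them.

```python
def partition_into_blocks(frame, block_height, block_width):
    """Split a frame (list of rows) into non-overlapping blocks.

    Returns {(block_row, block_col): block}. Incomplete blocks at the
    right and bottom edges are dropped (an assumption of this sketch).
    """
    rows, cols = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, rows - block_height + 1, block_height):
        for left in range(0, cols - block_width + 1, block_width):
            blocks[(top // block_height, left // block_width)] = [
                row[left:left + block_width]
                for row in frame[top:top + block_height]]
    return blocks


# A 4x6 test frame whose pixel value encodes its address (10*row + column).
frame = [[10 * y + x for x in range(6)] for y in range(4)]
blocks = partition_into_blocks(frame, block_height=2, block_width=3)
```

Applying the transform to the same block position in every frame of the temporal window then gives one displacement vector per block.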


4.1 Problems with FLIR Data.

Additional difficulties stem from the real-world data
we employ in our testing. We are using FLIR (forward-looking
infrared) data to test our approach. The data
demonstrates many corruptions. For example, it suffers from
severe salt-and-pepper noise. In addition, the images were
preprocessed so that the intensities span the complete range
[0-255] in each frame. This introduces spurious frame-to-frame
grey-scale changes.

We enhanced transform performance by preprocessing our
images. To reduce salt-and-pepper noise, we applied a
nonlinear noise-cleaning algorithm (from P1) to each frame of
the sequence. The rule used is: replace any pixel some
threshold above the average of its 4-connected neighbors by
that average. Next, we ameliorated the grey scale changes
due to self-normalization of the images via a histogram
correction technique (G1). We arbitrarily chose one image
of the sequence as "ideal." We computed the cumulative
frequency distribution (CFD) of this ideal image, and forced
the CFD's of all the other images in the sequence to
approximate this ideal CFD. (See Figures 5 a., b.)
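Both preprocessing steps can be sketched briefly. The noise-cleaning function follows the rule as stated; its border handling (averaging only the neighbors that exist) is an assumption. The CFD matching shown is one common way to force one histogram toward another and is not necessarily the technique of (G1). Function names and the `levels` parameter are hypothetical.

```python
def clean_salt_noise(frame, threshold):
    """Replace any pixel more than `threshold` above the average of its
    4-connected neighbours by that average (borders use the neighbours
    that exist, an assumption of this sketch)."""
    height, width = len(frame), len(frame[0])
    cleaned = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            neighbours = [frame[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x),
                                         (y, x - 1), (y, x + 1))
                          if 0 <= ny < height and 0 <= nx < width]
            average = sum(neighbours) / len(neighbours)
            if frame[y][x] > average + threshold:
                cleaned[y][x] = average
    return cleaned


def cumulative_counts(frame, levels):
    """Unnormalized CFD: number of pixels at or below each grey level."""
    counts = [0] * levels
    for row in frame:
        for value in row:
            counts[value] += 1
    total, cumulative = 0, []
    for count in counts:
        total += count
        cumulative.append(total)
    return cumulative


def match_cfd(frame, ideal, levels=256):
    """Remap integer grey levels of `frame` so its CFD approximates that of `ideal`."""
    cfd = cumulative_counts(frame, levels)
    ideal_cfd = cumulative_counts(ideal, levels)
    n, n_ideal = cfd[-1], ideal_cfd[-1]
    mapping = []
    for level in range(levels):
        # Smallest ideal level whose CFD reaches this level's CFD
        # (fractions compared by cross-multiplication to stay in integers).
        target = 0
        while ideal_cfd[target] * n < cfd[level] * n_ideal:
            target += 1
        mapping.append(target)
    return [[mapping[value] for value in row] for row in frame]


noisy = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
cleaned = clean_salt_noise(noisy, threshold=50)

ideal = [[0, 1], [2, 3]]
stretched = [[0, 2], [4, 6]]          # same scene, different grey scale mapping
corrected = match_cfd(stretched, ideal, levels=8)
```

Note that the rule as stated removes only bright ("salt") outliers. The matching step expects integer grey levels, so in this sketch it would be applied before any averaging produces fractional values.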

Our temporal window also generated significant problems: we
limited ourselves to 8 or 16 frames of the sequence.
We noted that the (remaining) salt-and-pepper noise in
the FLIR imagery looks like small objects moving with high
velocities to the motion detection transform. It is obvious
from the mathematical formulation of the transform that the
higher the weighting factors (kx and ky), the more sensitive
the transform becomes to noise. Therefore, we found it
expedient to limit our weighting factors to values of one or
two. This left us with very low velocity resolution, since
the precision with which a velocity can be located is higher
for larger weighting factors. We needed higher resolution.
We located the peak frequency, as discussed in section 3.0,
and then interpolated (computed a first moment about the
peak with the frequencies on either side of the actual peak)
to enhance velocity resolution.
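One reading of the interpolation step is a magnitude-weighted first moment over the peak bin and its two neighbours. The sketch below implements that reading; the exact weighting used in the original work is not given, so this form and the function name are assumptions.

```python
def interpolate_peak(magnitudes, peak):
    """Refine a peak location by the first moment of the magnitudes at
    peak-1, peak and peak+1 (indices wrap around, as DFT bins do)."""
    n = len(magnitudes)
    offsets = (-1, 0, 1)
    weights = [magnitudes[(peak + o) % n] for o in offsets]
    return peak + sum(o * w for o, w in zip(offsets, weights)) / sum(weights)


symmetric = interpolate_peak([0, 2, 4, 2, 0], peak=2)
skewed = interpolate_peak([0, 1, 4, 3, 0], peak=2)
```

Dividing the refined frequency by the weighting factor k then gives a sub-bin velocity estimate; a spectrum skewed toward the higher neighbour pulls the estimate upward.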

Another problem still remained: if the terrain were virtually
homogeneous, the dc component (zero velocity) could
mask the true velocity-related frequency. We instituted an
ad hoc rule stating that no zero-velocity peak would be
accepted unless it exceeded the next highest peak by more
than some adjustable percentage.
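The ad hoc rule can be written down directly. In this sketch every non-dc bin is treated as a candidate peak, and the default margin of 25 percent is purely hypothetical; the text says only that the percentage is adjustable.

```python
def select_peak(magnitudes, dc_margin=0.25):
    """Return the accepted peak bin.

    Bin 0 (zero velocity) is accepted only if it exceeds the largest
    other bin by more than the fraction `dc_margin`; otherwise the
    largest non-dc bin is returned.
    """
    best_nonzero = max(range(1, len(magnitudes)), key=lambda u: magnitudes[u])
    if magnitudes[0] > magnitudes[best_nonzero] * (1 + dc_margin):
        return 0
    return best_nonzero


rejected_dc = select_peak([10, 2, 9, 1])   # dc is only about 11% above bin 2
accepted_dc = select_peak([10, 2, 5, 1])   # dc is 100% above bin 2
```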


5.0 Use of the MDT for Frame-to-Frame Registration.

This section overviews the MTI algorithm (H2), then
describes our general approach to spatial registration, which
is based on MTI.

Given the conditions and goals stated above, we adapt
the basic MTI algorithm for our purposes. The MTI algorithm,
as defined by (H2), is as follows:


1) partition an image into blocks.

2) compute a displacement vector (frame[i] to
frame[i+1]) for each block of the image by
maximizing a grey level cross correlation
function (for each block) between two frames
of the sequence.

3) form motion model based on least squares fitting
a quadratic equation (in XD and ZD) to the
displacement vectors, with the Chi Square
criterion employed to cull out erroneous
displacement vectors, in a potentially multi-pass
process.

4) use the motion model from 3) and frame[i] of the
sequence to predict frame[i+j], P(frame[i+j]).

5) compute |frame[i+j] - P(frame[i+j])|
(as a pixel operation)


See (H1) for complete details.


Our adaptation of the MTI algorithm is as follows:

1) partition the image into blocks.

2) compute a displacement vector for each block of
the image by means of the MDT.

3) apply a multi-pass non-linear noise-cleaning
algorithm separately to the x and y components of
the displacement vectors to eliminate vectors due
to noise and those due to objects moving within
the scene.

4) use the set of displacement vectors from 3) to
compute the coefficients of a quadratic (in XD and
ZD) motion model.

5) MTI step 4).

6) MTI step 5).

We exploit the motion detection transform to compute our
displacement field. In the manner of (H1), we partition
frames into blocks and compute a displacement vector for
each block. However, the motion detection transform, rather
than cross correlation, produces these vectors.
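The quadratic motion model of step 4) can be illustrated with an ordinary least squares fit of one displacement component against block position (XD, ZD). The six-term form of the quadratic, the normal-equation solver and all names below are assumptions for this sketch; the original fitting procedure is described in (H1) and is not reproduced here.

```python
def quadratic_terms(xd, zd):
    """Terms of a full quadratic in the device coordinates (XD, ZD)."""
    return [1.0, xd, zd, xd * xd, xd * zd, zd * zd]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(points, values):
    """Least squares coefficients of the quadratic via the normal equations."""
    rows = [quadratic_terms(xd, zd) for xd, zd in points]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    return solve_linear(ata, atb)


# Hypothetical block centres on a 4x4 grid and one displacement component
# generated from known coefficients, so the fit can be checked.
true_coefficients = [1.0, 0.5, -0.25, 0.1, 0.0, 0.05]
points = [(float(x), float(z)) for x in range(4) for z in range(4)]
values = [sum(c * t for c, t in zip(true_coefficients, quadratic_terms(x, z)))
          for x, z in points]
fitted = fit_quadratic(points, values)
```

The same fit would be run once for the x components and once for the z components of the cleaned displacement vectors; evaluating the two fitted quadratics at every pixel address gives the displacement used to predict a later frame.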


5.1  Results  on  FLIR  Data. 

This  section  describes  our  success  in  application  of 
this  technique  to  FLIR  imagery,  and  compares  the  results 
with  simple  frame  differencing. 

Figure 6 a. shows the initial set of displacement vectors
computed in step two of our procedure. Figure 6 b.
shows the same set of vectors after several passes of the
noise cleaning process. Our system applies the motion model
derived from the displacement vectors to one frame of the
sequence and predicts some subsequent frame, frame n.
Figure 7 a. shows the absolute frame difference image for
actual and predicted frames. Figure 7 b. gives the histogram
of this image. Figures 8 a. and 8 b. represent analogous
data for the frame difference image for (actual) frame
one versus (actual) frame n. As the target in these images
occupies a small percentage of the total image, the spread
of the histogram is a good indicator of how well the frames
have been registered, and thus, how well the sensor motion
has been characterized by the motion model. The histogram
for the predicted case compares favorably with the second,
simple difference, case. As Figure 7 a. demonstrates, a
simple thresholding operation would virtually eliminate
false targets from the difference picture generated by our
system. This is certainly not so for the simple difference
picture, Figure 8 a.

Figure 6 a. Displacement Vectors (Unprocessed) superimposed
on frame 1 of the sequence.


Figure 6 b. Displacement Vectors after four passes of a
nonlinear noise cleaning algorithm.

Figure 7 a. Frame difference for motion model:

d[i,j] = abs(frame#1[i,j] - PREDICTED frame#n[i,j])


Figure 7 b. Histogram for motion model frame difference
picture. (see figure 7 a.)

Figure 8 a. Simple difference picture:

d[i,j] = abs(frame#1[i,j] - frame#n[i,j])


Figure 8 b. Histogram of simple difference picture.
(see figure 8 a.)

6.0 Ongoing Work.

The following text discusses extraction of sensor
motion parameters in general and describes our adaptation of
Nagel's technique for calculating sensor motion parameters
(N1, N2, N3) in particular.

Six parameters completely define the motion of a moving
sensor between frames of a dynamic sequence (D1, N1, A1).
These may be stated as three angles of rotation and three
components (X,Y,Z) of a translation vector. Application of
the translation vector moves the lens center in frame 2 into
coincidence with its frame 1 position. Borrowing notation
from computer graphics/aerial photogrammetry (D1, A1), the
three angles express three sequentially applied rotations
necessary to align each axis of the "new" camera coordinate
system to its position in the previous frame (D1, A1).
Other specifications of the information in these parameters
exist (eg P2, P3).

It is a well known fact that the translation vector can
be recovered only to within a scale factor, given a monocular
sensor (P2, N1, N2, N3, D1). (In fact, no absolute distances
can be recovered, only relative distances.) The
rotation angles can be unambiguously recovered [1], however,
as long as the relationship XP:f:ZP is maintained (ie, in a
digital environment, we need the focal length of the optical
system in device coordinate system units, as well as the
positional and temporal intensity information). Without the
focal length of the system, we essentially have the parallel
projection case, from which translation in depth cannot be
recovered and which renders indistinguishable "an object
rotating by some angle, a, and its mirror image rotating by
-a" (U1). In any case, perspective (central) projection
more closely approximates the photographic process.

[1] Nagel's technique requires that the data satisfy
conditions given in N3.

Generally speaking, solving for sensor motion parameters
implies solution of a system of nonlinear equations in
five unknowns, or equivalently, a search through a
five-dimensional space for the correct set of parameters (T1,
N1). Since the system is nonlinear, there is no guarantee
of a unique solution (T1). In addition, two parameters (the
translation) occupy infinite axes in this five-dimensional
space. Nagel has separated the calculation of the translation
from that of the rotation by means of algebraic manipulation
(Figure 9). He uses a minimization approach to
solve for the rotation parameters. Notice that his translation
vector calculations are linear and that the search
is reduced to a three-dimensional space.

We use a priori knowledge to further reduce the search
space. We know our imagery was generated by an IR system
side-mounted on an RPV. Significant rotation about the
focal axis is, therefore, deemed unlikely. Also, given a
high enough frame rate, the rotation angles between frames
should be small. Therefore, we need only search a portion
of the remaining search space for rotation angles.

We compute the sum of the absolute value of Nagel's
equation (2 Figure 9) over the entire image for each point
in our reduced search space. The minimum over the entire 2D
search space is chosen as the correct set of rotation
angles. (Of course, we use the SSDA concepts of B1 to speed
up calculations.) Then Nagel's equations (3 Figure 9) and
(1 Figure 9) deliver the translation vector (to within a
scale factor). The technique will be applied to displacement
fields extracted from our IR imagery as soon as camera data
(ie, focal length in device units) becomes available.
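The rotation search can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the authors' program: it uses row vectors with camera coordinates in frame 2 given by (A + T)D, takes unit focal length, searches a small hypothetical grid of two angles (no rotation about the focal axis), evaluates equation (2) on picture plane vectors (scale factors do not affect a zero residual), and omits the SSDA speed-up. All names and numeric values are hypothetical.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def row_times_matrix(v, m):
    return tuple(sum(v[i] * m[i][j] for i in range(3)) for j in range(3))


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def rotation(angle_x, angle_z):
    """Rotation about XC followed by ZC (none about the focal axis YC)."""
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return [[sum(rx[i][k] * rz[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def picture_plane(c, f=1.0):
    """Central projection of camera coordinates onto the plane YC = f."""
    s = c[1] / f
    return (c[0] / s, f, c[2] / s)


def equation2_residual(b1, b2, d):
    """Sum over all points of |(Bm1 x Bm2 D') * ((B11 x B12 D') x (B21 x B22 D'))|."""
    d_t = transpose(d)
    normals = [cross(p, row_times_matrix(q, d_t)) for p, q in zip(b1, b2)]
    axis = cross(normals[0], normals[1])
    return sum(abs(dot(n, axis)) for n in normals)


STEP = 0.01                                   # grid spacing in radians
TRUE_ANGLES = (2 * STEP, -3 * STEP)
points = [(1.0, 8.0, 0.5), (-2.0, 10.0, 1.5), (0.5, 12.0, -1.0),
          (-1.5, 9.0, -2.0), (2.5, 11.0, 2.0)]
translation = (0.4, 0.1, -0.3)
d_true = rotation(*TRUE_ANGLES)

frame1 = [picture_plane(a) for a in points]
frame2 = [picture_plane(row_times_matrix(
              tuple(x + t for x, t in zip(a, translation)), d_true))
          for a in points]

grid = [i * STEP for i in range(-4, 5)]
best = min(((ax, az) for ax in grid for az in grid),
           key=lambda angles: equation2_residual(frame1, frame2,
                                                 rotation(*angles)))
```

At the true rotation every vector Bm1 x Bm2 D' is perpendicular to the translation, so the residual vanishes up to rounding; elsewhere on the grid it does not, which is what the minimum search exploits.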


Figure 9: Nagel's Equations
(for extraction of sensor motion parameters)

The world coordinate system is static. The camera coordinate
system travels with the camera, and thus changes with
respect to the world system from frame to frame. We want to
know how the sensor moves between frames. This motion is
called relative sensor motion. In formulating the solution
to the relative sensor motion problem, common practice
(N1, N2, N3, A2, D1) is as follows: in frame one establish the
world coordinate system as coinciding with the (current)
camera coordinate system, ie, (Xi,Yi,Zi) = (XCi1,YCi1,ZCi1)
for all i. Then

Am = (Xm,Ym,Zm) = (XCm1,YCm1,ZCm1)    gives the mth point
                                      in world coordinates

Cm2 = (XCm2,YCm2,ZCm2) = (Am + T)D    gives camera coordinates for
                                      the point in the 2nd time frame

where T is the translation (vector) between
world system origin and camera system origin
in frame 2, Cm2' = Am + T,

and D is a 3 by 3 rotation matrix which aligns
the (XC',YC',ZC') axes with the (X,Y,Z) axes.

Rmn = (XPmn,f,ZPmn)    picture plane coordinates

so Cm2 = (sm2)Bm2 where (sm2)f = YCm2.


1) T = (sm2)Cm2D' - (sm1)Cm1

2) (Cm1 x Cm2D') * ((C11 x C12D') x (C21 x C22D')) = 0

3) (C21 x C22D') * ((sm2)Cm2D' - (sm1)Cm1) = 0


Note: First, equation (2) is solved for the parameters of D;
then equation (3) is used to solve for sm2 in terms of
sm1; finally, T is derived from (1).

D represents a 3 by 3 matrix; all other upper case
entities represent 3D vectors. Lower case items are
scalars. Scalar multiplication is represented by
concatenation, while "*" and "x" indicate vector dot
and cross product, respectively.

The single quote designates the transpose operator.


from (N1) and (N3)
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